How Much Is Said in a Tweet? A Multilingual, Information-theoretic Perspective
Abstract
This paper presents a multilingual study of how much information is contained in a single Twitter microblog post, covering 26 languages. To answer this question quantitatively, we take an information-theoretic approach, using entropy as our criterion for quantifying “how much is said” in a tweet. We find that, as expected, languages with larger character sets, such as Chinese and Japanese, contain more information per character than other languages. Somewhat surprisingly, however, information per character does not correlate strongly with information per microblog post, because authors writing in languages with more information per character do not necessarily use all of the space allotted to them. Finally, we examine the relative importance of a number of factors that contribute to whether a language has more or less information content in each character or post, and also compare the information content of microblog text with more traditional text from
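The distinction between information per character and information per post can be illustrated with a minimal Python sketch. It assumes a simple unigram character model; the abstract does not say which entropy estimator the study uses, so the functions `char_entropy_bits` and `info_per_post_bits` and the toy corpora below are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def char_entropy_bits(posts):
    """Unigram character entropy in bits per character (assumed model)."""
    counts = Counter(ch for post in posts for ch in post)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-empty post")
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def info_per_post_bits(posts):
    """Average bits per post: bits per character times mean post length."""
    mean_length = sum(len(post) for post in posts) / len(posts)
    return char_entropy_bits(posts) * mean_length


# Toy corpora (made up): a larger character set with short posts,
# and a smaller character set with long posts.
dense_posts = ["abcd", "efgh"]
sparse_posts = ["ab" * 8, "ba" * 8]

for name, posts in [("dense", dense_posts), ("sparse", sparse_posts)]:
    print(name, char_entropy_bits(posts), info_per_post_bits(posts))
```

In this toy setting the corpus with the larger character set carries more bits per character but fewer bits per post, because its posts are shorter. That is the kind of mismatch the abstract describes.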
Similar resources
Combination of real options and game-theoretic approach in investment analysis
Investments in technology account for a large share of capital investment by major companies. Assessing such investment projects is critical to the efficient allocation of resources. Viewing investment projects as real options, this paper extends a method for assessing technology investment decisions under both uncertainty and competition. It combines the game-theore...
Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
This paper provides a linguistic and pragmatic analysis of irony in order to show how Twitter users exploit irony devices within their communication strategies to generate textual content. We aim to measure the impact of a wide range of pragmatic phenomena on the interpretation of irony, and to investigate how these phenomena interact with contexts local to the twee...
A game Theoretic Approach to Pricing, Advertising and Collection Decisions adjustment in a closed-loop supply chain
This paper considers advertising, collection and pricing decisions simultaneously for a closed-loop supply chain (CLSC) with one manufacturer (he) and two retailers (she). A new multiplicatively separable demand function is proposed, which is influenced by pricing and advertising. In this paper, three well-known game-theoretic scenarios, namely the Nash, Stackelberg and Cooperative games, are expl...
Survey of companions of cancer patients about the need and how to express getting incurable cancer
Introduction: Delivering bad news to patients in special circumstances is one of the most important tasks of medical staff, and it is necessary to examine the views of companions and patients in this regard. Therefore, the aim of this study was to investigate the necessity and manner of expressing bad news (incurable cancer) from the perspective of canc...
Twitter and Privacy What can be mined
This paper will describe how private social media messages are these days, and how accurate the information that is gathered from this large amount of data is. However, there is no detailed literature review available that covers the different aspects of privacy of social media content. In this paper we focus on Twitter messages by first performing a literature review covering related work all ...